DOC: Fix DataFrame.to_xarray doctests and allow the CI to run it. #22673

Moisan · 2018-09-12T00:42:11Z

tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff

Fix the docstring for DataFrame.to_xarray as described in #22459. I removed the pandas.Panel example, since Panel is deprecated, and replaced it with a MultiIndex example.
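The replacement works by expressing the data the Panel example used to cover as a DataFrame with a two-level MultiIndex. Here is a minimal sketch of that kind of data. The level names `first` and `second` come from the review diffs. The values, the columns `A`/`B`, and the `HAS_XARRAY` guard are assumptions for illustration; the guard is there because xarray is an optional dependency of pandas.

```python
import importlib.util

import numpy as np
import pandas as pd

# xarray is an optional dependency, so only convert when it is installed.
HAS_XARRAY = importlib.util.find_spec('xarray') is not None

# Hypothetical values: a two-level index replaces the extra Panel axis.
index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'],
)
df = pd.DataFrame(np.arange(16.0).reshape(8, 2), index=index,
                  columns=['A', 'B'])

if HAS_XARRAY:
    # Each index level becomes a dimension, each column a data variable.
    ds = df.to_xarray()
    print(ds)
```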

pep8speaks · 2018-09-12T00:42:16Z

Hello @Moisan! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/generic.py!

Comment last updated on September 14, 2018 at 20:39 Hours UTC

gfyoung · 2018-09-12T00:59:42Z

pandas/core/generic.py

+        Return the xarray equivalent of the pandas object. `xarray
+        <http://xarray.pydata.org/en/stable/>`__ is a
+        Python package that allows to handle N-dimensional data.
+


I'm not sure if we need to mention this here.

Moisan

I was trying to extend the previous short summary. Do you suggest simply removing it, or is there something else we should mention here?

gfyoung

I think we can remove it. We describe xarray elsewhere in the docs anyhow. You don't always have to add an extended summary after the short one.

codecov · 2018-09-12T09:28:22Z

Codecov Report

Merging #22673 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #22673   +/-   ##
=======================================
  Coverage   92.17%   92.17%           
=======================================
  Files         169      169           
  Lines       50708    50708           
=======================================
  Hits        46740    46740           
  Misses       3968     3968

Flag	Coverage Δ
#multiple	`90.58% <ø> (ø)`	⬆️
#single	`42.35% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/generic.py	`96.67% <ø> (ø)`	⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 73dd6ec...a7ecbb2.

gfyoung

cc @jreback @datapythonista @WillAyd

jreback · 2018-09-12T11:23:55Z

pandas/core/generic.py

-          * items       (items) object 'A' 'B' 'C' 'D'
-          * major_axis  (major_axis) datetime64[ns] 2013-01-01 2013-01-02 2013-01-03  # noqa
-          * minor_axis  (minor_axis) object 'first' 'second'
+          * first    (first) object 'bar' 'baz' 'foo' 'qux'


It would be better to have a datetime index for one of the levels.
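To illustrate that suggestion, here is a minimal sketch that builds the MultiIndex with `pd.MultiIndex.from_product` from a date range and a string level. The level names and speeds reuse the values from a later comment in this thread. The construction itself is an illustration, not the code that was merged, and the `HAS_XARRAY` guard assumes xarray may not be installed. On conversion, each index level becomes a dimension of the DataArray, and the datetime level should become a datetime coordinate.

```python
import importlib.util

import pandas as pd

HAS_XARRAY = importlib.util.find_spec('xarray') is not None

# One datetime level and one string level (illustrative values).
index = pd.MultiIndex.from_product(
    [pd.date_range('2018-01-01', periods=2), ['falcon', 'parrot']],
    names=['date', 'animal'],
)
speed = pd.Series([350, 18, 361, 15], index=index, name='speed')

if HAS_XARRAY:
    # Series -> DataArray; the index levels become the dimensions.
    da = speed.to_xarray()
    print(da)
```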

datapythonista

Thanks for the changes @Moisan

I added some comments about the original docstring. If you implement them, I think the examples will not only pass the tests but also be much clearer.

datapythonista · 2018-09-12T11:06:17Z

pandas/core/generic.py

@@ -2498,11 +2498,15 @@ def to_xarray(self):
        a Dataset for a DataFrame
        a DataArray for higher dims


Do you mind changing this to the standard format? Only the type on the first line, and a description on the next. For example:

        xarray.DataArray or xarray.Dataset
            Data in the pandas structure converted to Dataset if the object is
            a DataFrame, or a DataArray if the object is a Series.
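For reference, here is a sketch of how the suggested Returns section sits in a numpydoc-style docstring. The function name `example_to_xarray` is hypothetical. The only point is the layout: the type on one line, followed by an indented description.

```python
def example_to_xarray():
    """
    Return an xarray object from the pandas object.

    Returns
    -------
    xarray.DataArray or xarray.Dataset
        Data in the pandas structure converted to Dataset if the object is
        a DataFrame, or a DataArray if the object is a Series.
    """
    # Hypothetical stub: only the docstring layout matters here.
    raise NotImplementedError


# Strip indentation so the section layout is easy to inspect.
doc_lines = [line.strip() for line in example_to_xarray.__doc__.splitlines()]
```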

datapythonista · 2018-09-12T11:09:02Z

pandas/core/generic.py

@@ -2498,11 +2498,15 @@ def to_xarray(self):
        a Dataset for a DataFrame
        a DataArray for higher dims

+        See also
+        --------
+        DataFrame.to_csv : Write out to a csv file.


See Also is with a capital A.

Personally, I don't like "recommending" to_csv here: besides not being a great format, CSV does not support multidimensional data. I think to_parquet and to_hdf are more appropriate.
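The point about CSV and multidimensional data shows up in a quick round trip. This sketch uses hypothetical data. It writes a frame with a two-level index to CSV and reads it back with the default `read_csv` options. The index levels come back as ordinary columns over a plain RangeIndex, so the structure has to be rebuilt by hand (for example with `index_col`).

```python
import io

import pandas as pd

# Hypothetical two-dimensional data: speed per animal and trial.
index = pd.MultiIndex.from_product([['falcon', 'parrot'], [1, 2]],
                                   names=['animal', 'trial'])
df = pd.DataFrame({'speed': [350.0, 361.0, 18.0, 15.0]}, index=index)

buf = io.StringIO()
df.to_csv(buf)          # CSV is flat: index levels are written as columns
buf.seek(0)
roundtrip = pd.read_csv(buf)

# Without index_col, the MultiIndex is lost on the way back.
print(roundtrip)
```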

datapythonista · 2018-09-12T11:11:56Z

pandas/core/generic.py

-                             ).set_index(['B','A'])
+        ...                    'B' : ['foo', 'bar', 'foo'],
+        ...                    'C' : np.arange(4.,7)}
+        ...                   ).set_index(['B','A'])


I don't see the need to repeat the previous DataFrame; we can just have df_multiindex = df.set_index(['B', 'A']).
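Here is a sketch of that suggestion. The `B` and `C` columns follow the diff. The values of column `A` are hypothetical, since the excerpt does not show them. `set_index` returns a new frame, so the original `df` is left unchanged and can still be used in the earlier example.

```python
import pandas as pd

# Column A's values are hypothetical; B and C follow the diff.
df = pd.DataFrame({'A': [1, 1, 2],
                   'B': ['foo', 'bar', 'foo'],
                   'C': [4., 5., 6.]})

# Reuse the existing frame instead of building a second one.
df_multiindex = df.set_index(['B', 'A'])
print(df_multiindex)
```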

datapythonista · 2018-09-12T11:23:09Z

pandas/core/generic.py

-                               'B' : ['foo', 'bar', 'foo'],
-                               'C' : np.arange(4.,7)})
+        ...                    'B' : ['foo', 'bar', 'foo'],
+        ...                    'C' : np.arange(4.,7)})


Using arange makes the code more complicated for no reason. I think [4., 5., 6.] is simpler and clearer.
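The two spellings produce the same values. `np.arange(4., 7)` is the half-open range from 4.0 up to, but excluding, 7 with step 1, so the literal list is equivalent and easier to read.

```python
import numpy as np

# Half-open range [4.0, 7): start is a float, so the result is float64.
from_arange = np.arange(4., 7)
literal = [4., 5., 6.]
print(from_arange.tolist() == literal)
```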

Also, it'd be good to have a meaningful example. The code in the example is hard to follow, because there is no way to tell that B is the object column other than by looking at the output. If we use a few samples of animals with name, num_legs and speed columns, it's obvious which column is the float, which the int, and which the str/object.
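Here is a sketch of that kind of example, with one column per dtype. The `monkey` row is taken from the later revision of the diff; the other rows are illustrative assumptions. The string columns stay non-numeric. `max_speed` is float because it holds a NaN, and `num_legs` is integer.

```python
import numpy as np
import pandas as pd

# Only the monkey row appears in the review excerpt; others are illustrative.
df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
                   ('parrot', 'bird', 24.0, 2),
                   ('lion', 'mammal', 80.5, 4),
                   ('monkey', 'mammal', np.nan, 4)],
                  columns=['name', 'class', 'max_speed', 'num_legs'])

# Each column's meaning makes its dtype obvious at a glance.
print(df.dtypes)
```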

datapythonista

Looks really great. Just a couple of comments that, if I'm right, should simplify the examples.

datapythonista · 2018-09-13T20:39:15Z

pandas/core/generic.py

+        ...                    ('monkey', 'mammal', np.nan, 4)],
+        ...                    columns=['name', 'class', 'max_speed',
+        ...                     'num_legs'],
+        ...                    index=[0, 2, 3, 1])


Is there any reason why we don't use the default index (i.e. not specify it), or specify a sorted one? Maybe I'm missing the point. It seems like this should mean something, but I couldn't find that meaning in the rest of the example. If there is no reason (maybe you copied it from an example where it served some purpose?), I'd just remove it, so we save some space and avoid distractions.

The indentation of num_legs seems wrong; I think it should be aligned with name. At some point we'll start validating PEP8 in the examples automatically, so it would be great to fix this now.
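Here is a sketch of both points, with illustrative rows. Omitting `index` gives the default RangeIndex 0..n-1. The continuation line of the `columns` list is aligned with its first element, following PEP8 visual indentation.

```python
import numpy as np
import pandas as pd

# Illustrative rows; no index argument, so pandas assigns a RangeIndex.
df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
                   ('monkey', 'mammal', np.nan, 4)],
                  columns=['name', 'class', 'max_speed',
                           'num_legs'])  # continuation aligned with 'name'
print(df.index)
```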

datapythonista · 2018-09-13T21:05:11Z

pandas/core/generic.py

+        ...   names=['first', 'second'])
+
+        >>> s = pd.Series(np.arange(8), index=index)
+        >>> s


I find this too complicated for what we need to show. To get a Series with a MultiIndex that has a datetime level, we can use something like:

    import pandas as pd
    df = pd.DataFrame({'date': pd.to_datetime(['2018-01-01', '2018-01-01',
                                               '2018-01-02', '2018-01-02']),
                       'animal': ['falcon', 'parrot', 'falcon', 'parrot'],
                       'speed': [350, 18, 361, 15]}).set_index(['date', 'animal'])
    df['speed']
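This Series suits the example because of its two-level index. The sketch below rebuilds the reviewer's frame and pivots the `animal` level into columns with `unstack`. The result is a 2 x 2 date-by-animal grid. That grid is roughly what `to_xarray` represents as a two-dimensional DataArray with `date` and `animal` dimensions. The `unstack` step is only an analogy added here, not part of the proposed docstring.

```python
import pandas as pd

# The reviewer's frame, laid out over several lines.
df = pd.DataFrame({'date': pd.to_datetime(['2018-01-01', '2018-01-01',
                                           '2018-01-02', '2018-01-02']),
                   'animal': ['falcon', 'parrot', 'falcon', 'parrot'],
                   'speed': [350, 18, 361, 15]}).set_index(['date', 'animal'])
speed = df['speed']

# Pivot the 'animal' level into columns: a 2 x 2 date-by-animal grid,
# the same shape to_xarray would give as a 2-D DataArray.
grid = speed.unstack('animal')
print(grid)
```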

I haven't used xarray much myself, and I'm not sure what makes sense to show here. Maybe:

Series.to_xarray()

DataFrame.to_xarray()

DataFrame(with multiindex including datetime).to_xarray()

If that makes sense, I think that with the first example we can have df.to_xarray() and df['max_speed'].to_xarray(), and then an example like the one I wrote.
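Here is a sketch of the first two cases, assuming the animals DataFrame from the earlier revision. Only the `monkey` row appears in the excerpt; the rest is illustrative. `DataFrame.to_xarray` returns an `xarray.Dataset` with one data variable per column, and `Series.to_xarray` returns an `xarray.DataArray`. With an unnamed default index, xarray names the single dimension `index`. The `HAS_XARRAY` guard is an assumption so that the snippet still runs where xarray is not installed.

```python
import importlib.util

import numpy as np
import pandas as pd

HAS_XARRAY = importlib.util.find_spec('xarray') is not None

# Illustrative rows; only the monkey row appears in the review excerpt.
df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
                   ('parrot', 'bird', 24.0, 2),
                   ('lion', 'mammal', 80.5, 4),
                   ('monkey', 'mammal', np.nan, 4)],
                  columns=['name', 'class', 'max_speed', 'num_legs'])

if HAS_XARRAY:
    ds = df.to_xarray()               # DataFrame -> xarray.Dataset
    da = df['max_speed'].to_xarray()  # Series -> xarray.DataArray
    print(ds)
    print(da)
```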

@jreback does this make sense?

Sorry for requesting more changes @Moisan, but I feel the current version gives the impression that we're trying to show something more complex than what we are actually showing.

Moisan

No problem, I'm happy to make the examples more relevant :).

datapythonista

lgtm, thanks @Moisan

@jreback I think your comments have been properly addressed; let us know if that's not the case.

TomAugspurger · 2018-09-19T15:37:38Z

Thanks!


DOC: Fix DataFrame.to_xarray doctests and allow the CI to run it.

670e768

gfyoung added the Docs and DataFrame labels Sep 12, 2018

gfyoung reviewed Sep 12, 2018


Remove extended summary from to_xarray docstring

a7ecbb2

gfyoung approved these changes Sep 12, 2018


jreback reviewed Sep 12, 2018


datapythonista reviewed Sep 12, 2018


datapythonista added the IO Data label Sep 12, 2018

Refactor the examples and the See Also section of to_xarray docstring

ce5098a

datapythonista reviewed Sep 13, 2018


Thierry Moisan and others added 2 commits September 14, 2018 16:39

Simplify to_xarray doctests

08561d2

Merge branch 'master' into docstring_to_xarray

93edaca

datapythonista approved these changes Sep 18, 2018


TomAugspurger merged commit d923385 into pandas-dev:master Sep 19, 2018

Moisan deleted the docstring_to_xarray branch September 19, 2018 17:21

aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018

DOC: Fix DataFrame.to_xarray doctests and allow the CI to run it. (pandas-dev#22673)

27ea656

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: Fix DataFrame.to_xarray doctests and allow the CI to run it. (pandas-dev#22673)

d020a44
